Relative abundance data can misrepresent heritability of the microbiome

Background Host genetics can shape microbiome composition, but to what extent it does, remains unclear. Like any other complex trait, this important question can be addressed by estimating the heritability (h2) of the microbiome—the proportion of variance in the abundance in each taxon that is attributable to host genetic variation. However, unlike most complex traits, microbiome heritability is typically based on relative abundance data, where taxon-specific abundances are expressed as the proportion of the total microbial abundance in a sample. Results We derived an analytical approximation for the heritability that one obtains when using such relative, and not absolute, abundances, based on an underlying quantitative genetic model for absolute abundances. Based on this, we uncovered three problems that can arise when using relative abundances to estimate microbiome heritability: (1) the interdependency between taxa can lead to imprecise heritability estimates. This problem is most apparent for dominant taxa. (2) Large sample size leads to high false discovery rates. With enough statistical power, the result is a strong overestimation of the number of heritable taxa in a community. (3) Microbial co-abundances lead to biased heritability estimates. Conclusions We discuss several potential solutions for advancing the field, focusing on technical and statistical developments, and conclude that caution must be taken when interpreting heritability estimates and comparing values across studies. Video Abstract Supplementary Information The online version contains supplementary material available at 10.1186/s40168-023-01669-w.


S1 Approximating the heritability when based on relative abundances
We here derive an analytical approximation of the heritability that one obtains when using relative abundances, based on a quantitative genetic framing of absolute abundances.According to this framing, the absolute abundance of taxon i in host individual j can be written as: α i is the average absolute abundance of taxon i, G ij is a host genetic contribution (i.e. a breeding value; for simplicity, we assume no genetic dominance or epistasis), E ij is an environmental (residual) contribution, and we assume no G × E interactions.
Distribution of a taxon's absolute abundance In a population of hosts, the absolute abundance of each taxon i (Y ) is distributed as: where α i is the average absolute abundance, and V P i is the total variance, calculated as the sum of the genetic and environmental variance (i.e.V P i = V Gi + V Ei ; assuming a zero G × E covariance).

Distribution of total community absolute abundance
The absolute abundance of the entire community consisting of M microbes, is also a normally distributed variable which we call X.The average of X is simply the sum of the average abundances of each of M microbes: E[X] = M j=1 α j .The variance in X depends both on the variance in abundance of each microbe, as well as on the covariance between each pair of microbes: (Eq.S3) Where cov P (j, k) is the phenotypic covariance between taxa j and k, which is non-zero when microbial abundances covary.This can also be written as the sum of the genetic and environmental covariances (cov G and cov E ).

Distribution of relative abundance
Let us now focus on how the relative abundance of taxon i is distributed across host individuals, based on variables X and Y .The relative abundance (fraction f P i ) is calculated as the absolute abundance of focal microbe i (Y ), divided by the entire community abundance (X), and therefore is distributed as: (Eq.S4) Eq. S4 gives the total distribution of relative abundances.Similarly, we can also obtain how relative abundances of taxon i vary across host genotypes, by replacing all V P by V G , and considering the genetic covariances (cov G ) between each pair of microbes: We are interested in quantifying the variance in f P i and in f Gi , as this is the total, and the genetic variance, respectively, in relative abundances of taxon i.The proportion of the variance in relative abundance explained by host genetic variation (i.e. the heritability based on relative abundances, from now on called φ 2 i ) is then: Calculating the variance in relative abundance To calculate var(f Gi ) and var(f P i ), we use an approximation of the variance of the ratio between two dependent normally distributed variables (Hayya et al., 1975).The variance in Y X , provided that both terms are normally distributed (with mean µ and variance σ 2 ), can be approximated as: x (Eq.S7) (Hayya et al., 1975).Here, ρ is the correlation between X and Y , which we can also write as: Since, in our case, we can write X as a function of Y (i.e.X = Y + j̸ =i N (α j , V P j )), the covariance between We now introduce the following 'microbiome community' properties: Eq. S10 -Eq.S16 describe the total average abundance of all non-focal microbes (i.e.excluding taxon i) (A); the total variance in absolute abundance of all non-focal microbes (z); the total genetic variance in absolute abundance of all non-focal microbes (ω); the sum of the genetic covariances between focal microbe i and each of the other community members (γ); the sum of the environmental covariances between focal microbe i and each of the other community members (ϵ); the sum of the genetic covariances between each pair of background community members (κ); and the sum of the environmental covariances between each pair of background community members (ν).Doing this has the advantage that we no longer need to consider each of the non-focal microbes explicitly; instead, it is sufficient to know their total effects in terms of abundance and (co)variances.Combining Eq.S7 with the above equations, we can now approximate the total variance in the relative abundance of taxon i as: For readability, we have removed subscript i from α, V G and V P .We can rewrite Eq.S17 as: Similarly, the variance in relative abundance due to genetic variation can be written as: Finall, dividing Eq.S19 by Eq.S18, we obtain the heritability based on relative abundances: S2 Numerical simulation results

S2.1 Microbiome datasets simulation
In order to compare our analytical results to numerical results, we simulated populations of hosts and their microbiomes.For simplicity, we assumed asexual hosts with known genotype identities.Each taxon i out of a total of M taxa can be considered as a taxon from the same taxonomic level (e.g.OTU or genus).We did not simulate taxonomic relationships between microbes, as this is irrelevant for the purpose of our study, however, we note that studies differ in the taxonomic classification levels they consider (Table 1 of the manuscript).
We assigned each taxon an average abundance α and set the standard deviation in absolute abundance proportional to its abundance ( √ V P = α/6) to ensure that values do not become negative.We assigned each taxon a heritability (h 2 ).
For each taxon i in host individual k that belongs to genotype j, we then obtained a microbial (absolute) abundance (P ijk ), following a standard quantitative genetic framing: G ij is the genetic effect for microbe i and host j (i.e. a breeding value; varying across genotypes), and E ijk is a residual (environmental) contribution.
Genetic and environmental contributions were sampled from multivariate normal distributions.
Here, K G and K E are the genetic and environmental variance-covariance matrices, respectively.On the diagonal of these matrices are the genetic and environmental variances (V G and V E ) for each microbe i, respectively, calculated as: To test how genetic and environmental correlations between microbes affect heritability estimates, we set genetic and environmental covariances in abundance on the off-diagonals.The covariance between microbe i and microbe m was calculated as: Here, r G and r E are the genetic and environmental correlations, respectively, which we varied across simulations.
Finally, per host, we rescaled all abundances to express them as relative abundance by dividing each abundance by the sum of all M microbial abundances in host genotype j, individual k:

S2.2 Heritability estimation
We used these simulated relative abundances f to fit a linear mixed effects model, per microbial taxon, with host genotype as a random effect: f i ∼ 1 + (1|Genotype).This structure matches fully with the simulation structure; there are no confounding factors; and all abundances are normally distributed.With a sufficiently large data set and when using the true (absolute) abundances as the response variable, we thus retrieve the true coefficients and variance components.
The estimated random effect variance of host genotype measures the variation due to host genetics V G .We calculated heritabilities by dividing V G by the total variance: h 2 = V G /V P .Note that as we included only additive genetic effects in our simulations, the broad and narrow sense heritability are identical in our case.

S2.3 Problem 1: Interdependency between taxa can lead to unprecise heritability estimates
In Figure 1 of the manuscript, we illustrate how the interdependency between relative abundances can lead to unprecise heritability estimates.For the numerical results in Figure 1c (crosses), we simulated datasets with the following properties.We simulated a microbiome community consisting of 101 taxa (1 focal, and 100 background community members).The average abundance of the focal taxon was either 1, 10, 100 or 1000 (x-axis in Figure 1 of the manuscript), and each community taxon had an average abundance of 1.The focal taxon had a heritability of 0.2, and all background members shared the same heritability (either 0, .2,.5 or 1, see colors in Figure 1 of the manuscript).We simulated microbiome communities for 500 different host genotypes × 1000 replicated hosts within each population.We calculated heritabilities by fitting a mixed effects model (as explained in Section S2.2).

S2.4 Problem 2: Large sample size leads to high false discovery rates
In Figure 2 of the manuscript, we show how larger sample sizes increase the chance that a non-heritable microbe (wrongly) appears significantly heritable.In a community of microbes, this can lead to a great overestimation of the proportion of microbes that is heritable.To illustrate this, we use simulations where we varied the sample size (i.e.number of hosts) and the true proportion of heritable microbes.
When varying sample size, we kept the total number of different host genotypes constant at 50.Having 50 grouping levels is well above the recommended number of levels needed to get reliable random effect variance estimates (Harrison et al., 2018;Oberpriller et al., 2022).Results are thus not driven by error due to limited random effect levels.To vary sample size across simulations, we varied the number of replicated hosts within each genotype between 2 and 1000.This generates datasets on 50 -50,000 hosts, well covering the range of real-world microbiome datasets used for heritability estimation (Table 1 in the manuscript).
We varied the proportion of microbes that is heritable between 0 and 1, keeping the total number of microbes at 100.For all non-heritable microbes, we set V G = 0.For microbes with a non-zero heritability, we drew values from a uniform distribution, so that for each microbe, 0 < h 2 < 1.
Per simulated dataset, we then estimated the heritability of each taxon by fitting a mixed effects model as explained in Section S2.2.The significance of the estimated h 2 was assessed by comparing the likelihood of this mixed effects model, with that of a simpler model without host genotype.We corrected for multiple testing using the Benjamini Hochberg approach, setting the false discovery rate (FDR) to 0.1.This procedure matches the methods described in a recent study estimating microbiome heritability in baboons (Grieneisen et al., 2021).
The accuracy of identifying the number of heritable microbes was calculated as the F-score, which is the harmonic mean of the precision and recall: Precision = TP TP + FP (Eq.S29) Recall = TP TP + FN (Eq.S30) (TP: True Positives, FP: False Positives, FN: False Negatives).When precision=1, there are no false positives; when recall=1, there are no false negatives.The F-score equals 1 when both precision and recall are 1, implying maximal accuracy.
Under low sample sizes, accuracy in estimating the proportion of heritable microbes is low due to a lack of power to detect significant effects, resulting in type 2 errors (i.e.false negatives), reflected by a recall that is <1 (Fig. S1).
However, under large sample sizes, accuracy is also strongly reduced (Fig. S1).This is due to a strong reduction in precision (i.e. an increase in false positives), leading to a serious overestimation in the number of heritable microbes (Fig. S1).The reduction in accuracy is strongest when only a small proportion of the microbes is truly heritable; with the largest dataset that we simulated (consisting of 50 genotypes × 1000 replicated hosts within each genotype = 50,000 samples), the estimated proportion of heritable microbes was about 90%, while in (simulated) reality, only 10% of the microbes was heritable.
When using the (true) absolute abundances, the problem of low statistical power with small data sets (unsurprisingly) remains, but precision does not decrease with sample size, resulting in very accurate estimates of the overall microbiome heritability (Fig. S1).At the core of this issue is the interdependency of relative abundances: a host genetic effect in some microbes, will also create genetic differences in relative (but not absolute) abundances for other, non-heritable microbes.This leads to consistent differences in relative abundance among host genotypes, even if very small, and this is precisely what results in an estimated non-zero genetic variance, appearing significant with enough statistical power (i.e. with enough hosts).

S2.5 Problem 3: Microbial co-abundances lead to biased heritability estimates
In Figure 3c-e of the manuscript, we added numerical results (crosses) to our analytical results for a comparison.
For these numerical results, we simulated 500 host genotypes and 500 replicated hosts within each genotype.We simulated a microbiome community consisting of 101 microbes (1 focal taxon and 100 background community taxa).We varied both the heritability of the focal taxon (x-axis in Figure 3c-e) and the heritability of each of the background community members (colors in Figure 3c-e).A non-zero genetic correlation creates correlations between breeding values (see Eq. S26), whereas a non-zero environmental correlation creates correlations at the individual host level (Eq.S27).We used the same correlation strength for each pair of microbes to create the covariance matrix.

Relative abundances
Figure S1: Accuracy (measured as F-score, see Eq. S31) of the estimated percentage of heritable microbes when using relative (upper row) and absolute (bottom row) abundances.The total number of microbes is set to 100 in these simulations.On the left: Accuracy (purple: low; green: high) as a function of sample size (x-axis) and the true percentage of heritable microbes (y-axis).Middle graphs: With small datasets, the true number of heritable microbes (dashed line) is generally underestimated (solid lines below the dashed lines to the left of the plot), due to a lack of statistical power, both when using absolute and relative abundances.Under high sample sizes, the number of heritable microbes is greatly overestimated when using relative abundances, but not when using absolute abundances (solid lines above dashed lines to the right of the plot on the upper panel).On the right: Accuracy split into effects through recall (Eq.S30), and precision (Eq.S29).Recall is the ratio between the number of true positives and the true number of heritable microbes.When recall < 1, this indicates the presence of false negatives.
Precision is the ratio between the number of true positives and the total number of significant results.A value lower than 1 indicates that there are false positives.Under a high sample size and when using relative abundances, precision is greatly reduced, indicating an increase in false positives.All value are median values of 10 simulations.

S3 Power analysis
To calculate the chance that a non-heritable microbe wrongly appears significantly heritable, we performed a power analysis using R-package simr (Green and MacLeod, 2016).To this end, we used the function makeLmer() to build an artificial lme4 object.Here, the fixed intercept was the average relative abundance of our focal taxon, calculated as α α+100 (we tested different α values, see colored lines in Figure 3 of the manuscript).We set the slope to 0. The random effect variance (V G ) equaled the genetic variance in relative abundance (var(f Gi ); Eq.S19), and the residual variance was calculated as: var(f Ei ) = var(f P i ) − var(f Gi ) (see Eq. S19, Eq.S18).We created a dataset with 25 host genotypes, and x replicates within each host genotype.We used the function powerCurve() to calculate the power at a range of x values (x-axis in Figure 3 of the manuscript shows total sample size, i.e. x × 25).We tested the significance of the random effect by setting the argument test=random(), which performs a likelihood ratio test on model with and without genotype as random effect.We performed 1000 simulations at each sample size, and used α = 0.05 as a significance level.

S4 Summary of published results
Table 1 of the main text summarizes results from empirical studies that estimate the heritability of individual taxa, for different host systems, populations and tissues.Below, we give some detailed information on each of the studies, and how we arrived at the numbers shown in Table 1.Where possible, we analyzed the provided heritability effect sizes and p-values, and use the same methodology to calculate the overall proportion of heritable microbes, in order to make results as comparable as possible.We note that the resulting estimates do not necessarily correspond to the results that are reported in the original study (for instance, some studies do not report the overall proportion of significant results, or use no or a different method to correct for multiple testing).

S4.1 O'Connor et al. (2014)
Host: Mice (gut microbiome) Samples: 32 (8 outbred mice strains, 2 cages/strain, 4 animals/cage; divided over two diets.Only control diet is used for microbiome analyses) Host genetic information: Lineage Taxa: 43 taxa (genus level) Transformation: Total Sum Scaling Model: Heritabilities are estimated using inbred strain analysis, using relative abundances as the response variable, and no other covariates.Heritability is calculated as: (0.5 * V a )/(0.5 * V a + V e ) Results: Heritability estimates range between 0.19 and 0.84 (average=0.47).We obtained these numbers by digitizing Table 1 in  Results: None of the taxa is significantly heritable (note that no information on significance test is provided).

S4.3 Davenport et al. (2015)
Host: Humans (gut microbiome) Samples: 93, 91 and 127 (for winter, summer, and both seasons combined, respectively) Host genetic information: SNPs Taxa: 16, 104 and 102 (for winter, summer, and both seasons combined, respectively.From genus to phylum level, including taxa that were present in at least 75% of the individuals, and excluding taxa that were highly correlated with a lower level or same level taxon (r ≥ 0.9)).
Transformation: Quantile normalization Model: Abundance data were collected for two seasons (summer and winter).For each season and per taxon, a quantile normalization was performed.For the analyses for both seaons combined, values were averaged for individuals that were sampled in both seasons.Multiple regression with age, sex and date of collection on these normalized abundances was performed.Residuals were used to estimate the 'chip heritability' using GEMMA software based on ∼200k SNPs.Results were considered significant if standard errors did not overlap with 0.

Transformation: Box-Cox transformation
Model: Same method as in S4.17, include only individuals aged >12.Covariates: age, the number of sequences per sample and sex.

S4.6 Wright et al. (2021)
Host: Humans (vaginal microbiome) Samples: 244 and 88 (European and African ancestry, respectively.Both MZ and DZ twin pairs) Host genetic information: Twins Taxa: 3 taxa (present in >20% of the samples and with a within-host proportion of >10%) Transformation: Arcsine square root transformation on relative abundances Model: OpenMx R-package was used to estimate heritabilities, for each self-identified ancestral group separately.
Alpha level of 0.05 was used to assess significance.
Results: European ancestry: 1 taxon is significantly heritable (estimate: 0.35).African ancestry: no significant effects.Host genetic information: Pedigree Taxa: 166 (excluding taxa with residuals Kurtosis ¿ 0.8, results in 249 taxa from phylum to genus level.Redundant taxa were then removed, leaving 166 taxa for the analyses) Transformation: Inverse normal transformation Model: Sequential Oligogenic Linkage Analysis Routines (SOLAR) was used to estimate heritabilities, adjusting for age and sex.Correct for correlations among taxa and multiple testing following Gao et al. (2010Gao et al. ( , 2008)).

Host genetic information: SNPs
Taxa: 110 (families, >10% prevalence) Transformation: Total Sum Scaling Model: R-package Sommer was used to fit multivariate linear mixed models, using a kinship matrix based on the SNPs.Results was considered significant if standard errors did not overlap with zero.

Host genetic information: SNPs
Taxa: 512 (OTUs, across 11 prokaryotic orders, present in > 50% of the animals within each of the seven farms studied) Transformation: Quantile normalization Model: Genetics Complex Trait Analysis (GCTA) were used to calculate the variance in quantile normalized data that is explained by SNPs.Farm, dietary components and first five PCs of genetic relatedness matrix were used as covariates.Heritability confidence intervals were estimated based on software FIESTA, and p-values were corrected with the Benjamini-Hochberg approach (FDR<0.05).
Results: For Holstein-Friesian breed, 39/512 (7.6%) was heritable (ranges: 0.2-0.6).For Nordic-Red breed, 3/512 (0.59%) was heritable.Note that as the study does not provide p-values for each of the 512 taxa, we could not calculate the number of significant results when setting the FDR to 0.1, to make it more consistent with the other results.Model: An ACE model was fitted using R-package mets to calculate heritabilities, controlling for sex and age.

S4.13
Host genetic information: Pedigree Taxa: 1678 OTUs Transformation: Total sum scaling Model: OTU counts (transformation not mentioned, assuming relative abundances are used, i.e. total sum scaling) were used in a sire model.Sex, sire, dam and contemporary group were included as fixed effects; sire and pen were included as random effects.Heritability calculated as: (4V sire )/(V sire + V pen + V E ).Significance level is not provided; we assume that heritabilies are considered significant when standard errors do not overlap with 0.
Host genetic information: Lineage Taxa: 792 and 2557 OTUs (2010 and 2015 field study, respectively.Present in >80% of the samples) Transformation: Log-transformation on relative abundances Model: Mixed effects models (using lme4) were fitted on the log-transformed relative abundances.Sampling depth was included as fixed effect, and week, location and inbred line were included as random effects.Broad-sense heritability was calculated as the proportion of variance explained by inbred line.To obtain p-values, a permutation analysis (permuting abundances across all samples, 5000 times) was performed.Heritabilities were considered significant if p≤0.001.
Results: For the 2010 field study: 143/792=18% OTUs are significantly heritable (with p<0.001).Estimates of significant taxa range between 0.15-0.23,with an average of 0.17 (calculated using Dataset S4 in Walters et al. (2018)).For the 2015 field study: 5/2557 (0.2%) of the OTUs are heritable (p < 0.001), with estimates ranging between 0.41 and 0.53 (mean: 0.45) (calculated using Dataset S4).Note that only 200 the OTUs with highest heritability are provided in Dataset S4, so we could not correct p-values using Benjamini Hochberg approach, and assess the significance using the same approach as done in other studies.
O'Connor et al. (2014).No significance or uncertainty measures are provided.S4.2 Zhao et al. (2013) Host: Chickens (gut microbiome) Samples: 56 (consisting of males and females, from a low high and low body weight selection treatment) Host genetic information: Pedigree Taxa: 23 OTUs (from which 13 belong to Lactobacillus) Transformation: Log-transformed and scaled Model: Wombat software was used to obtain heritability estimates.No information on any other covariates is given, nor in how significance was assessed.

S4. 7
Xie et al. (2016) Host: Humans (gut microbiome) Samples: 250(35 MZ and 92 DZ twin pairs).Host genetic information: TwinsTaxa: 109 genera (include when detected in >50% of the samples) Transformation: Box-Cox transformation Model: ACE model, implemented in OpenMx software, was fitted on the on Box-Cox transformed (1 as offset) values, correcting for sequencing amount, age and age started living apart.Obtain p-values by comparing ACE model with CE model, and correct for multiple testing according to Storey's FDR method.Results: 11/109 of the genera showed a significant heritability (p-value < 0.1) (10%).Heritability estimates of individual taxa are not provided.S4.8Turpin et al. (2016) Host: Humans (gut microbiome) Samples: 270 (from 123 families with at least 2 individuals; this is a subset of the total sample size of 1,561 individuals).

S4. 12
Si et al. (2017) Host: Humans (vaginal microbiome) Samples: 542 (222 MZ twins, 56 DZ twins, 102 family members, 80 related females and 82 unrelated females) Host genetic information: Twins Taxa: 369 (OTUs and higher taxonomic levels, included when present in > 50% and 15% of the individuals, respectively)Transformation: Inverse normal transformationModel: A linear regression was fitted on the inverse normally transformed abundances, correcting for age, menopause, hormone therapy, bacterial vaginosis, HPV, BMI, waist-hip ratio, OTU counts per sample and sequence run.Sequantial Oligogenic Linkage Analysis Routine (SOLAR) was then used to estimate heritabilities.The study confirms heritability estimates using OpenMx software.

S4. 21
Goodrich et al. (2016) Host: Humans (gut microbiome) Samples: 3261 samples (2731 different individuals including 489 DZ twin pairs and 637 MZ twin pairs.530 individuals with at least two samples).Host genetic information: Twins Taxa: 945 taxa (from which 782 OTUs; include taxa that are present in >50% of the samples) Transformation: Box-Cox transformation Model: Use Box-Cox transformed (offset of 1) relative abundances in multiple regression, with the following covariates: number of 16s rRNA sequences per sample, age, gender, shipment date, collection method, and technician identity.Residuals were used in ACE model, implemented in OpenMx software.Significance of host genetics was determined by comparing with CE model (without host genetics), based on a likelihood ratio test.For samples